A Comparison of Word Frequency and N-Gram Based Vulnerability Categorization Using SOM
Authors
Abstract
Network attackers exploit software vulnerabilities on networked computers to facilitate successful attacks. Many organizations keep track of existing software vulnerabilities in the form of vulnerability databases. However, categorizing vulnerabilities is difficult due to the large number of different attributes maintained. In this work we apply a data clustering algorithm, the self-organizing map (SOM), to two different representations of the information contained in an existing online vulnerability database: word frequencies and n-grams. After identifying the more valuable approach for this task, we are able to identify critical vulnerability features inherent in the dataset.
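The two representations and the clustering step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy descriptions, the choice of character 3-grams, the 2x2 map size, the linear decay schedules, and all function names (`word_freq`, `char_ngrams`, `vectorize`, `best_unit`, `train_som`) are assumptions made for illustration only.

```python
import math
import random
from collections import Counter

def word_freq(text):
    """Word-frequency representation: counts of whitespace-separated tokens."""
    return Counter(text.lower().split())

def char_ngrams(text, n=3):
    """Character n-gram representation: counts of overlapping n-character windows."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def vectorize(counters):
    """Map a list of Counters to L2-normalised dense vectors over a shared vocabulary."""
    vocab = sorted(set().union(*counters))
    vectors = []
    for c in counters:
        v = [float(c[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def best_unit(weights, v):
    """Index of the map unit whose weight vector is closest to v (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda k: sum((w - x) ** 2 for w, x in zip(weights[k], v)))

def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood (assumed settings)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01     # neighbourhood radius shrinks
        for v in vectors:
            b = best_unit(weights, v)
            for k, (r, c) in enumerate(coords):
                d2 = (r - coords[b][0]) ** 2 + (c - coords[b][1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                # Pull unit k toward the input, weighted by its grid distance to the winner.
                weights[k] = [w + lr * h * (x - w) for w, x in zip(weights[k], v)]
    return weights

# Hypothetical vulnerability descriptions, invented for this sketch.
docs = [
    "buffer overflow in ftp server allows remote code execution",
    "stack buffer overflow in mail server allows remote attackers",
    "sql injection in web login form",
    "sql injection in search form of web application",
]

for name, rep in (("word frequency", word_freq), ("character 3-gram", char_ngrams)):
    vecs = vectorize([rep(d) for d in docs])
    som = train_som(vecs)
    print(name, [best_unit(som, v) for v in vecs])
```

Each printed list gives the map unit assigned to each description under one representation, so the two lists can be compared side by side. On real data one would use a larger map and compare the representations with a cluster-quality measure rather than by eye.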
Similar Resources
A Comparison of Text-Categorization Methods Applied to N-Gram Frequency Statistics
This paper gives an analysis of multi-class e-mail categorization performance, comparing a character n-gram document representation against a word-frequency-based representation. Furthermore, the impact of using available e-mail-specific meta-information on classification performance is explored and the findings are presented.
A Comparison of Support Vector Machines and Self-Organizing Maps for e-Mail Categorization
This paper reports on experiments in multi-class document categorization with support vector machines and self-organizing maps. A data set consisting of personal e-mail messages is used for the experiments. Two distinct document representation formalisms are employed to characterize these messages, namely a standard word-based approach and a character n-gram document representation. Based on th...
Using Word Sequences for Text Summarization
Traditional approaches for extractive summarization score/classify sentences based on features such as position in the text, word frequency and cue phrases. These features tend to produce satisfactory summaries, but have the inconvenience of being domain dependent. In this paper, we propose to tackle this problem representing the sentences by word sequences (n-grams), a widely used representati...
Language-independent text categorization by word N-gram using an automatic acquisition of words
We previously proposed the accumulation method, a language-independent text classification method that is based on character N-grams. The accumulation method does not depend on the language structure because this method uses character N-grams to form
Improving Chinese Word Segmentation by Adopting Self-Organized Maps of Character N-gram
Character-based tagging method has achieved great success in Chinese Word Segmentation (CWS). This paper proposes a new approach to improve the CWS tagging accuracy by combining Self-Organizing Map (SOM) with structured support vector machine (SVM) for utilization of enormous unlabeled text corpus. First, character N-grams are clustered and mapped into a low-dimensional space by adopting SOM al...